
Batch-Downloading Movie Links with a Python Crawler

Tags:

Python

Next, each movie's FTP download link and magnet link are easy to parse out of its detail page:

[image]
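
As a minimal sketch of that parsing step, the snippet below pulls `ftp://` and `magnet:` links out of an HTML fragment with a regular expression. The sample fragment, the helper name `extract_links`, and the pattern itself are illustrative assumptions; the site's real markup may differ.

```python
import re

# Hypothetical fragment standing in for a detail page; the real markup may differ.
SAMPLE_HTML = '''
<td style="WORD-WRAP: break-word"><a href="ftp://example.com:2121/movie.mkv">ftp://example.com:2121/movie.mkv</a></td>
<td style="WORD-WRAP: break-word"><a href="magnet:?xt=urn:btih:0123456789abcdef0123456789abcdef01234567">magnet link</a></td>
<a href="/i/99901.html">unrelated link</a>
'''

# Match href values that start with ftp:// or magnet:? and stop at the closing quote.
LINK_RE = re.compile(r'href="((?:ftp://|magnet:\?)[^"]+)"')


def extract_links(html):
    """Return (ftp_links, magnet_links) found in the given HTML text."""
    links = LINK_RE.findall(html)
    ftp_links = [link for link in links if link.startswith('ftp://')]
    magnet_links = [link for link in links if link.startswith('magnet:')]
    return ftp_links, magnet_links


ftp_links, magnet_links = extract_links(SAMPLE_HTML)
print(ftp_links)
print(magnet_links)
```

Matching on the `href` attribute rather than on the link text keeps ordinary page links, such as `/i/99901.html`, out of the result.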

With the theory covered, the Python implementation is as follows:

# -*- coding: utf-8 -*-
# Written for Python 3.
import re
import time

import requests
import requests_cache

# Cache responses locally so repeated runs do not re-fetch the same pages.
requests_cache.install_cache('demo_cache')

BASE_URL = 'https://www.dy2018.com'
url = BASE_URL + '/html/gndy/dyzz/index.html'
# url = 'https://www.dy2018.com/i/99901.html'
# Full browser User-Agent, for reference:
# Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)
# Chrome/65.0.3325.181 Safari/537.36 OPR/52.0.2871.64
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'}


def fetch(page_url):
    """Download a page once and decode it as GBK, the encoding the site uses."""
    r = requests.get(page_url, headers=headers)
    print(r.status_code, r.encoding)  # debug output
    r.raise_for_status()
    return r.content.decode('gbk', errors='replace')


try:
    content = fetch(url)

    # Save the raw index page for inspection.
    with open('temp_pachong.txt', 'w', encoding='utf-8') as fp:
        fp.write(content)

    # Each entry on the index page looks like:
    # <a href="/i/99901.html" class="ulink" title="2018年美国7.6分恐怖片《遗传厄运》BD中英双字">2018年美国7.6分恐怖片《遗传厄运》BD中英双字</a>
    pattern = re.compile(r'<b>.*?<a href="/i/(.*?)\.html" class="ulink" title="(.*?)">.*?</a>.*?</b>', re.S)
    items = pattern.findall(content)

    # The download link sits in a styled table cell on the detail page.
    link_pattern = re.compile(r'<td style=".*?"><a href="(.*?)">.*?</a></td>', re.S)
    localtime = time.strftime('%Y-%m-%d %H:%M:%S')
    banner = '*' * 20 + localtime + '*' * 20

    # Output file; the name means "Movie Heaven crawler".
    with open('电影天堂爬虫.txt', 'w', encoding='utf-8') as fp:
        fp.write(banner + '\n')
        print('Total resources on this page: ' + str(len(items)))
        for count, item in enumerate(items, 1):
            title = str(count) + ':  ' + item[1]
            print(title)
            fp.write(title + '\n')
            detail_url = BASE_URL + '/i/' + item[0] + '.html'
            print(detail_url)
            # Fetch the detail page and extract the first download link, if any.
            links = link_pattern.findall(fetch(detail_url))
            if links:
                print(links[0])
                fp.write(links[0] + '\n')
        fp.write(banner)
except requests.exceptions.RequestException as e:
    # Covers connection failures, timeouts, and HTTP error statuses.
    print(e)
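
The listing decodes every page as GBK instead of trusting the encoding that requests reports. The sketch below simulates that situation offline. It assumes the server sends GBK bytes without declaring a charset, in which case requests falls back to ISO-8859-1 for text responses; the sample string and variable names are illustrative.

```python
# Bytes as the server would send them: a GBK-encoded string.
original = '电影天堂'
raw_bytes = original.encode('gbk')

# Decoding with the wrong charset (ISO-8859-1) yields mojibake but loses no data,
# because ISO-8859-1 maps every byte to exactly one character.
garbled = raw_bytes.decode('iso-8859-1')

# Re-encoding with the same wrong charset restores the bytes; decoding as GBK fixes the text.
recovered = garbled.encode('iso-8859-1').decode('gbk')

# Decoding the raw bytes directly is equivalent and skips the round trip.
direct = raw_bytes.decode('gbk')

print(garbled != original, recovered == original, direct == original)
```

Either route gives the same text, so decoding the response body directly as GBK is the simpler choice when the bytes are available.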

The actual result looks like this:

[image: view-source_https____i_99901.html.png]

Author: 看星星的天空
Link: https://www.jianshu.com/p/e9b5518bcdae
